We apply deep learning (DL) to the task of brain tumor detection from magnetic resonance spectroscopy (MRS) data. Medical applications are often plagued by data scarcity and corruption by noise, and both problems are prominent in our dataset. In addition, a different number of spectra is available for each patient. We address these issues by treating the task as a multiple-instance learning (MIL) problem. Specifically, we aggregate multiple spectra from the same patient into a "bag" for classification and apply data augmentation techniques. To achieve permutation invariance during the bagging process, we propose two approaches: (1) applying min-, max-, and average-pooling over the features of all samples in a bag, and (2) applying an attention mechanism. We test both approaches with multiple neural network architectures. We demonstrate that classification performance improves significantly when training on multiple instances rather than on single spectra. We further propose a simple oversampling-based data augmentation method and show that it can improve performance even more. Finally, we demonstrate that our proposed model outperforms manual classification by neuroradiologists on most performance metrics.
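The two permutation-invariant bag-aggregation schemes described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the scoring vector `w` stands in for the learned attention parameters, and `pool_bag` covers the min-/max-/average-pooling variant. Both operators produce the same bag embedding regardless of the order of spectra in the bag.

```python
import numpy as np

def pool_bag(features, mode="mean"):
    # features: (n_instances, d) array of per-spectrum embeddings.
    # Pooling over the instance axis is permutation-invariant by construction.
    if mode == "mean":
        return features.mean(axis=0)
    if mode == "max":
        return features.max(axis=0)
    if mode == "min":
        return features.min(axis=0)
    raise ValueError(f"unknown pooling mode: {mode}")

def attention_pool(features, w):
    # w: (d,) scoring vector, a stand-in for the learned attention weights.
    scores = features @ w
    a = np.exp(scores - scores.max())   # numerically stable softmax
    a /= a.sum()
    return a @ features                 # weighted sum over instances
```

Because every instance's weight depends only on its own features, shuffling the rows of `features` leaves the aggregated embedding unchanged, which is exactly the property the bagging process requires.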
CNN-based surrogates have become prevalent in scientific applications as replacements for conventional, time-consuming physical approaches. Although these surrogates can yield satisfactory results at significantly lower computation cost on small training datasets, our benchmarking results show that data-loading overhead becomes the major performance bottleneck when training surrogates with large datasets. In practice, surrogates are usually trained with high-resolution scientific data, which can easily reach the terabyte scale. Several state-of-the-art data loaders have been proposed to improve loading throughput in general CNN training; however, they are sub-optimal when applied to surrogate training. In this work, we propose SOLAR, a surrogate data loader that substantially increases loading throughput during training. It builds on three key observations from our benchmarking and contains three novel designs. Specifically, SOLAR first generates a pre-determined shuffled index list and accordingly optimizes the global access order and the buffer eviction scheme to maximize data reuse and the buffer hit rate. It then trades a lightweight computational imbalance for the removal of a heavyweight loading-workload imbalance to speed up the overall training. Finally, it optimizes its data access pattern with HDF5 to achieve better parallel I/O throughput. Our evaluation with three scientific surrogates and 32 GPUs illustrates that SOLAR achieves up to 24.4X speedup over the PyTorch Data Loader and 3.52X speedup over state-of-the-art data loaders.
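The core idea behind SOLAR's first design can be sketched in a few lines: because the shuffled access order for every epoch is fixed before training starts, the loader can plan its buffer around the full schedule rather than reacting to one random index at a time. The sketch below is a simplified illustration under assumed names (`make_schedule`, `buffer_hit_rate`), not SOLAR's actual implementation; it pre-generates the per-epoch shuffles and measures the hit rate of a plain LRU buffer over that schedule.

```python
import random
from collections import OrderedDict

def make_schedule(n_samples, n_epochs, seed=0):
    # Pre-determine the shuffled access order for every epoch up front,
    # so buffering decisions can be planned before training begins.
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_epochs):
        order = list(range(n_samples))
        rng.shuffle(order)
        schedule.append(order)
    return schedule

def buffer_hit_rate(schedule, capacity):
    # Replay the flattened schedule against an LRU buffer and report
    # the fraction of sample accesses served without touching storage.
    buf, hits, total = OrderedDict(), 0, 0
    for epoch in schedule:
        for idx in epoch:
            total += 1
            if idx in buf:
                hits += 1
                buf.move_to_end(idx)   # refresh recency
            else:
                if len(buf) >= capacity:
                    buf.popitem(last=False)  # evict least recently used
                buf[idx] = True
    return hits / total
```

In the full system, the known schedule also lets the loader reorder accesses within a permissible window and choose smarter evictions than plain LRU, which is what pushes the hit rate (and hence loading throughput) beyond what a reactive loader can achieve.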